Skip to main content

4.2 - Design - Expanding the State and Action Space

To graduate our agent from a single-purpose actor to a decision-maker, we must evolve its environment. This involves expanding its "senses" (the observation space) and its "abilities" (the action space) to provide the necessary tools for learning the more complex task of balancing economic priorities.

This document details the design changes from the WorkerEnv to the new MacroEnv.

Agent Requirements Analysis

The new task requires the agent to:

  • Sense when its supply capacity is becoming a limiting factor.
  • Possess the ability to increase that supply capacity.

This mandates a direct evolution of our environment's API.


1. Observation Space Evolution

The agent needs a more complete picture of its supply situation than just supply_left. We will provide the raw components of supply to allow the agent to learn the relationship itself.

Design Comparison:

  • WorkerEnv Observation Space - Box(3,)

    IndexFeature
    0Minerals
    1Worker Count
    2Supply Left
  • MacroEnv Observation Space - Box(4,)

    IndexFeatureRationale for Change
    0Minerals(Unchanged) Required for affordability checks.
    1Worker Count(Unchanged) Required as a progress metric.
    2supply_used(New) Provides the "demand" side of the supply equation.
    3supply_cap(New) Provides the "supply" side of the equation.

Rationale: Providing supply_used and supply_cap separately is a more robust design. It gives the agent the raw data and allows the neural network to learn the concept of "supply pressure" on its own, which can lead to a more nuanced and effective policy.


2. Action Space Evolution

To meet the new requirements, the agent must be given the ability to build a supply structure.

Design Comparison:

  • WorkerEnv Action Space - Discrete(2)

    ActionMeaning
    0Do Nothing
    1Build Worker
  • MacroEnv Action Space - Discrete(3)

    ActionMeaningRationale for Change
    0Do Nothing(Unchanged) A passive choice is always required.
    1Build Worker(Unchanged) The primary economic action.
    2Build Supply(New) Directly provides the agent with the tool needed to solve the new problem.

These modifications transform the environment from a simple, single-goal task to a more dynamic system where the agent must learn to prioritize actions based on a richer set of inputs.